FFmpeg Basics Notes

FFmpeg Fundamentals

FF: fast forward
MPEG: Moving Picture Experts Group

FFmpeg Introduction

command-line tools:

ffmpeg
ffplay
ffprobe
ffserver

ffmpeg software libraries

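libavcodec: encoders and decoders
libavdevice: input/output devices (capture and playback)
libavfilter: audio and video filters
libavformat: muxers, demuxers and I/O protocols
libavutil: shared utility functions
libpostproc: video post-processing
libswresample: audio resampling
libswscale: image scaling and pixel format conversion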

Developers of FFmpeg

Participation in FFmpeg development

FFmpeg download

Command line syntax

ffmpeg [global options] [input options] -i input_1 [input options] -i input_2 [output options] output1 [output options] output2

Inputs and outputs can be files, pipes, network streams or devices.

Windows Command Prompt and its alternatives

Path Setting

Renaming to shortened form

Displaying output preview

ffplay (renders the preview with SDL)

SI prefixes available in FFmpeg

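K: kilo (10^3)
M: mega (10^6)
G: giga (10^9)
Appending i uses powers of 2 (Ki = 2^10, Mi = 2^20); appending B multiplies the value by 8.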

Using lavfi

lavfi (short for libavfilter); with -f lavfi, a filtergraph source such as color=c=white can be used as an input.

Intros

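decode: compressed data to raw frames
encode: raw frames to compressed data
transcode: decode, then encode again (possibly with a different codec)
mux: combine streams into a container
demux: split a container into its streams, e.g. separate the video and the audio
stream
filter: process raw frames (scale, crop, overlay, ...)
play: ffplay
libavfilter: the filter subsystem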

reference

ffmpeg git addr

Displaying Help and Features

configure is the script used to build FFmpeg; its --list-* options (for example --list-decoders) show which components the source supports.
ffmpeg [global options] [[infile options]['-i' infile]]... {[outfile options] outfile}...

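Show information

ffmpeg -i h1.mp4 ## show information about the MP4 file
ffmpeg -hide_banner -i h1.mp4
## -hide_banner shows only the media file information, without FFmpeg's own details. All FFmpeg tools will normally show a copyright notice, build options and library versions. This option can be used to suppress printing this information.
-b ## bit rate option (see Setting the bit rate)
ffmpeg -formats ## list supported formats, e.g. to check whether a file type is supported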
ffmpeg -codecs
Codecs:
D..... = Decoding supported
.E.... = Encoding supported
..V... = Video codec
..A... = Audio codec
..S... = Subtitle codec
...I.. = Intra frame-only codec
....L. = Lossy compression
.....S = Lossless compression
-y (global)
Overwrite output files without asking.

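List all decoders

In the source root run: ./configure --list-decoders (this lists the decoders available in the source tree; for an already built binary, ffmpeg -decoders shows what was compiled in)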

BitRate, Frame Rate and FileSize

Frame rate setting

-r 30 means 30 frames per second
-vf fps=fps=25

If the new rate is higher than the original, frames are duplicated; if it is lower, frames are dropped.

ffmpeg -i input.avi -r 30 output.mp4
ffmpeg -i input.avi -vf fps=fps=30 output.mp4

Bit rate Introduction

Reference: rate control

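Setting the bit rate

-b sets the bit rate; -b:v and -b:a apply it to the video or the audio stream only.

ffmpeg -i file.avi -b 1.5M film.mp4 ## ffmpeg warns that -b is ambiguous; prefer -b:v or -b:a
ffmpeg -i file.avi -b:v 1500k output.mp4 ## video only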

CBR

For constant bit rate, give -b, -minrate and -maxrate the same value; -bufsize sets the size of the rate-control buffer.

ffmpeg -i in.avi -b:v 0.5M -minrate 0.5M -maxrate 0.5M -bufsize 1M output.mkv

Setting Maximum size of output file

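-fs limits the output file size, in bytes. A B suffix multiplies the value by 8, so 10M means 10,000,000 bytes while 10MB would mean 80,000,000 bytes.

ffmpeg -i input.avi -fs 10M output.mp4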

file size calculation

$\text{video\_size} = \text{video\_bitrate} \times \text{time\_in\_sec} / 8$
$\text{audio\_size\_uncompressed} = \text{sample\_rate} \times \text{bit\_depth} \times \text{channels} \times \text{time\_in\_sec} / 8$
$\text{audio\_size} = \text{audio\_bitrate} \times \text{time\_in\_sec} / 8$

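In digital audio using pulse-code modulation, the bit depth is the number of bits of information stored per sample, and it directly determines the resolution of each sample. For example, audio CDs store 16-bit samples, so each sample can hold one of 65,536 possible amplitude values; DVD-Audio and Blu-ray support up to 24 bits, i.e. up to 16,777,216 possible amplitude values per sample.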
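A minimal POSIX shell sketch of the size formulas above. The function names (video_bytes, pcm_bytes, total_bytes) are hypothetical helpers, not FFmpeg tools; bit rates are taken in kbit/s with FFmpeg's SI convention (1k = 1000), and container overhead is ignored, so real files come out somewhat larger.

```shell
#!/bin/sh
# Hypothetical helpers: rough size estimates in bytes, ignoring container overhead.
# Bit rates are in kbit/s (1k = 1000, as in FFmpeg's SI prefixes).

# video_bytes KBPS SECONDS -> video_bitrate * time_in_sec / 8
video_bytes() {
  echo $(( $1 * 1000 * $2 / 8 ))
}

# pcm_bytes SAMPLE_RATE BIT_DEPTH CHANNELS SECONDS -> uncompressed audio size
pcm_bytes() {
  echo $(( $1 * $2 * $3 * $4 / 8 ))
}

# total_bytes VIDEO_KBPS AUDIO_KBPS SECONDS -> compressed video + audio
total_bytes() {
  echo $(( ($1 + $2) * 1000 * $3 / 8 ))
}

total_bytes 1500 128 60   # 1500k video + 128k audio for one minute: 12210000 bytes
pcm_bytes 44100 16 2 60   # one minute of CD-quality PCM: 10584000 bytes
```

Reading the formulas backwards gives a bit-rate budget: for a target size, divide 8 x size by the duration and subtract the audio bit rate.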

Resizing and Scaling Video

Resizing and Scaling Video

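Side note: Which one is correct when describing the screen resolution of a smartphone, 1920x1080 or 1080x1920?

Both are correct: they describe the same pixel grid in landscape (width x height = 1920x1080) and in portrait (1080x1920) orientation.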

Resizing video

ffmpeg -i input -s 320x240 output_file

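Considerations when resizing: the Nyquist theorem

In short: if a signal is sampled at intervals equal to an integer multiple of its period, the change in phase cannot be detected.

The sampling period must be shorter than half the signal period, i.e. the sampling frequency must be more than twice the original frequency.

When scaling down to QQVGA (160x120), details narrower than 2 pixels become invisible; a detail needs at least 3 pixels to stay visible.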
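To apply the pixel rule of thumb above, it helps to compute how wide a detail becomes after scaling. A small sketch, assuming the detail shrinks linearly with the frame width; scaled_px is a hypothetical helper, not an FFmpeg feature.

```shell
#!/bin/sh
# Hypothetical helper: width of a detail after scaling the frame.
# scaled_px DETAIL_PX SRC_WIDTH DST_WIDTH -> DETAIL_PX * DST_WIDTH / SRC_WIDTH
scaled_px() {
  awk -v d="$1" -v s="$2" -v t="$3" 'BEGIN { printf "%.2f\n", d * t / s }'
}

scaled_px 20 1920 160   # a 20 px detail in a 1920-wide frame becomes 1.67 px: likely lost
scaled_px 40 1920 160   # 3.33 px: should remain visible
```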

special Enlarging filter

super2xsai: doubles both the width and the height, smoothing with the Super2xSaI pixel-art scaling algorithm

ffmpeg -i input -vf super2xsai output

Advanced scaling

-vf scale=width:height

-vf scale=iw/2:ih/2
-vf scale=iw*0.9:ih*0.9
## keep the aspect ratio (a = iw/ih)
-vf scale=400:400/a
-vf scale=300*a:300
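Expressions like 400/a can produce odd or fractional heights, which 4:2:0 encoders reject. FFmpeg can handle this itself with scale=W:-2 (keep the aspect ratio, make the other dimension divisible by 2); the sketch below only shows the arithmetic. scaled_height is a hypothetical helper that rounds to the nearest even number.

```shell
#!/bin/sh
# Hypothetical helper: height that keeps the aspect ratio for a new width,
# rounded to the nearest even number (4:2:0 video needs even sizes).
# scaled_height IN_W IN_H OUT_W -> round(OUT_W * IN_H / (2 * IN_W)) * 2
scaled_height() {
  echo $(( ($3 * $2 + $1) / (2 * $1) * 2 ))
}

scaled_height 1920 1080 1280   # 720
scaled_height 640 480 426      # 320 (exact value 319.5)
```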

cropping video

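Cropping selects a wanted rectangular area of the frame.

crop=ow:oh:x:y:keep_aspect

ow, oh: width and height of the output (the cropped area)
x, y: position of the cropped area's top-left corner, measured from the top-left corner of the input frame

ffmpeg -i input -vf crop=iw/3:ih:0:0 output ## left third
ffmpeg -i input -vf crop=iw/3:ih:iw/3:0 output ## middle third
ffmpeg -i input -vf "crop=iw/3:ih:iw/3*2:0" output ## right third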

cropping frame center

x and y default to (iw-ow)/2 and (ih-oh)/2, so leaving them out crops the center of the frame.

ffmpeg -i input -vf crop=w:h output_file
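The default offsets can be checked by computing them explicitly. center_crop is a hypothetical helper that prints the equivalent full crop filter, assuming integer pixel sizes:

```shell
#!/bin/sh
# Hypothetical helper: explicit crop filter for a centered rectangle,
# using the crop defaults x=(iw-ow)/2, y=(ih-oh)/2.
# center_crop IN_W IN_H OUT_W OUT_H
center_crop() {
  echo "crop=$3:$4:$(( ($1 - $3) / 2 )):$(( ($2 - $4) / 2 ))"
}

center_crop 1920 1080 1280 720   # crop=1280:720:320:180
```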

automatic detection of cropping area

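Detects black borders so they can be cropped away. cropdetect does not crop by itself; it prints suggested crop=w:h:x:y values in the log.

ffmpeg -i input -vf cropdetect=limit=0 -f null -

limit is the black threshold: 0 means only complete black counts as border.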

cropping of timer

padding video

Adds extra area around the frame.

padding basics

-vf pad=width:height:x:y:color
-vf pad=860:660:30:30:pink

padding videos from 4:3 to 16:9

ffmpeg -i input -vf "pad=ih*16/9:ih:(ow-iw)/2:0" film_wide.avi
ffmpeg -i input -vf "pad=iw:iw*3/4:0:(oh-ih)/2:color" output ## the reverse direction: 16:9 to 4:3, bars top and bottom

padding from and to various aspect ratios

ffmpeg -i input -vf "pad=ih*ar:ih:(ow-iw)/2:0:color" output ## ar: placeholder for the target aspect ratio, e.g. 16/9
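The padding commands above can be generalized to any target ratio: add bars left and right when the source is narrower than the target, top and bottom when it is wider. A sketch under stated assumptions: pad_to_aspect is a hypothetical helper, and it rounds sizes and offsets down to even numbers on the assumption of 4:2:0 video, roughly like FFmpeg's own rounding in pad.

```shell
#!/bin/sh
# Hypothetical helper: pad filter that reaches NUM:DEN with the picture centered.
# Sizes and offsets are rounded down to even numbers (assumes 4:2:0 video).
# pad_to_aspect IN_W IN_H NUM DEN
pad_to_aspect() {
  w=$1; h=$2; num=$3; den=$4
  pw=$w; ph=$h
  if [ $(( w * den )) -lt $(( h * num )) ]; then
    pw=$(( h * num / den ))   # source narrower than target: bars left and right
  else
    ph=$(( w * den / num ))   # source wider or equal: bars top and bottom
  fi
  pw=$(( pw - pw % 2 )); ph=$(( ph - ph % 2 ))
  x=$(( (pw - w) / 2 )); y=$(( (ph - h) / 2 ))
  x=$(( x - x % 2 )); y=$(( y - y % 2 ))
  echo "pad=$pw:$ph:$x:$y"
}

pad_to_aspect 1440 1080 16 9   # pad=1920:1080:240:0
pad_to_aspect 1920 1080 4 3    # pad=1920:1440:0:180
pad_to_aspect 640 480 16 9     # pad=852:480:106:0
```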

flipping and rotating video

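-vf hflip mirrors horizontally (left-right)
-vf vflip flips vertically (upside down)

-vf transpose=0
0: rotate 90 degrees counterclockwise, then flip vertically
1: rotate 90 degrees clockwise
2: rotate 90 degrees counterclockwise
3: rotate 90 degrees clockwise, then flip vertically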

Blur, Sharpen and Other Denoising

Blur video effect

noise reduction with hqdn3d

-vf hqdn3d

high quality denoise 3-dimensional

-nr noise reduction

Overlay picture in picture

ffmpeg -i input -i input_2 -filter_complex overlay=x:y output
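ffmpeg -i input -i input_2 -filter_complex overlay output ## top-left
ffmpeg -i input -i input_2 -filter_complex overlay=W-w output ## top-right
ffmpeg -i input -i input_2 -filter_complex overlay=W-w:H-h output ## bottom-right
ffmpeg -i input -i input_2 -filter_complex overlay=0:H-h output ## bottom-left

Coordinates are measured from the top-left corner (W, H: size of the main video; w, h: size of the overlay). Adding a logo is the most common use.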

Adding Text on Video

Introduction

drawtext=fontfile=font_f:text=text1

ffplay -f lavfi -i color=c=white -vf drawtext=fontfile=/usr/share/fonts/TTF:text=Welcome
ffplay -f lavfi -i color=c=white -vf drawtext="fontfile=/usr/share/fonts/TTF:text='Welcome':x=(w-tw)/2:y=(h-th)/2"
ffplay -f lavfi -i color=c=white -vf drawtext="fontfile=/usr/share/fonts/TTF:text='Welcome':x=(w-tw)/2:y=(h-th)/2:fontcolor=green:fontsize=30"

Dynamic text

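Scrolling text: x=w-t*50 starts the text at the right edge and moves it left by 50 pixels per second (t is the time in seconds).

ffplay -f lavfi -i color=c=white -vf drawtext="fontfile=/usr/share/fonts/TTF:text='Welcome':x=w-t*50"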

Conversion Between Formats

ffmpeg -formats

Changing the container while the codecs stay the same (stream copy, no re-encoding, so a -q option would have no effect here):

ffmpeg -i input -c copy output.mp4

transcoding and conversion

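-c copy: copy the streams without re-encoding

ffmpeg -codecs
ffmpeg -decoders
ffmpeg -encoders

-q / -qscale: encode with a fixed quality scale (its range and direction depend on the encoder) instead of a target bit rate; it does not automatically keep the source quality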

time operations

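-t sets the duration, in seconds (HH:MM:SS is also accepted)

ffmpeg -i music.mp3 -t 180 music_3.mp3

-vframes / -aframes set the number of video / audio frames to output

ffmpeg -i video -vframes 25000 output.mp4

-ss sets the start position (seek)

ffmpeg -i input -ss 10 output ## after -i, the first 10 s are decoded and discarded; placed before -i, the input is seeked, which is faster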

delay between input streams

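-map selects particular streams.

-itsoffset sets an input timestamp offset; it applies to the input file that follows it.

Audio and video out of sync

Assume the video runs 5 seconds ahead of the audio. Read the file twice, delay the first copy (used for video) by 5 seconds and take the audio from the unshifted second copy:

ffmpeg -itsoffset 5 -i input -i input -map 0:v -map 1:a -c:a copy -c:v copy output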

limit for processing time

ffmpeg -timelimit 600 -i input output
## -timelimit (global): exit after ffmpeg has run for 600 seconds of CPU user time (not 600 s of media)

shortest stream determines encoding time

ffmpeg -i video.avi -i audio.mp3 -shortest output.mp4

timestamp and time bases

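tbn: time base of the stream in the container; the value shown is the reciprocal of AVStream.time_base
tbc: time base of the codec; the value shown is the reciprocal of AVCodecContext.time_base
tbr: guessed from the video stream; it may be the frame rate or the field rate (twice the frame rate)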
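Timestamps are integers counted in units of the time base, so seconds = timestamp x time_base. A stream shown with "24k tbn", for example, has the time base 1/24000. A small sketch of the conversion that ffprobe's *_time fields display; ts_seconds is a hypothetical helper, and the 1/44100 example assumes an audio time base equal to the sample rate.

```shell
#!/bin/sh
# Hypothetical helper: convert a timestamp to seconds.
# ts_seconds TS TB_NUM TB_DEN -> TS * TB_NUM / TB_DEN (6 decimals, like ffprobe)
ts_seconds() {
  awk -v ts="$1" -v n="$2" -v d="$3" 'BEGIN { printf "%.6f\n", ts * n / d }'
}

ts_seconds -2002 1 24000      # -0.083417
ts_seconds 1001 1 24000       # 0.041708 (one frame at 23.976 fps)
ts_seconds 47497007 1 44100   # 1077.029637
```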

encoding timebase setting

-copytb mode specifies how to set the encoder time base when stream copying: 1 = use the demuxer time base, 0 = use the decoder time base, -1 (default) = choose automatically.

-c copy copies all video/audio/subtitle streams without re-encoding them.
-copyts copies the timestamps from the source instead of creating new ones, which can help keep audio and video in sync.
Open question: are there pros and cons to "copy timestamps" versus "create new timestamps"?

audio and video speed modifications

setpts=expression

ffplay -i input -vf setpts=PTS/3 ## 3x faster; setpts=PTS*3 would play 3x slower

audio speed change

atempo=factor (changes the tempo without changing the pitch)

Double speed:

ffplay -i speech.mp4 -af atempo=2
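Older FFmpeg versions limit atempo to factors between 0.5 and 2.0; recent ones accept more, but tempos above 2 skip samples instead of blending them, and the FFmpeg documentation suggests chaining several atempo instances instead. A sketch of that idea; atempo_chain is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: comma-separated atempo filters whose product is FACTOR,
# keeping every single factor inside [0.5, 2.0].
atempo_chain() {
  awk -v f="$1" 'BEGIN {
    if (f <= 0) exit 1            # tempo must be positive
    chain = ""
    while (f > 2.0) { chain = chain "atempo=2.0,"; f /= 2.0 }
    while (f < 0.5) { chain = chain "atempo=0.5,"; f /= 0.5 }
    printf "%satempo=%g\n", chain, f
  }'
}

atempo_chain 3      # atempo=2.0,atempo=1.5
atempo_chain 0.25   # atempo=0.5,atempo=0.5
# Usage (needs ffmpeg): 3x speed for picture and sound together
#   ffmpeg -i in.mp4 -vf "setpts=PTS/3" -af "$(atempo_chain 3)" out.mp4
```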

synchronizing audio data with timestamps

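asyncts=param

ffmpeg -i input -af asyncts=compensate=1 -f mpegts music.ts
## if your build has no asyncts filter, aresample=async=1 pads or trims the audio to match its timestamps (larger values also allow stretching)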

Mathematical Functions

metadata and subtitles

Introduction to metadata

ffmpeg -i input -metadata artist=FFmpeg -metadata title="Test 1" output.mp4

Metadata are key-value pairs; ASF, FLV, Matroska, WMA and WMV accept arbitrary keys, so any key can be stored.

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'output.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2mp41
    title           : test 1
    artist          : FFmpeg
    encoder         : Lavf58.65.101
  Duration: 00:07:37.02, start: 0.000000, bitrate: 1571 kb/s
    Stream #0:0(und): Video: mpeg4 (Simple Profile) (mp4v / 0x7634706D), yuv420p, 1920x1080 [SAR 1:1 DAR 16:9], 1439 kb/s, 23.98 fps, 23.98 tbr, 24k tbn, 24k tbc (default)
    Metadata:
      handler_name    : VideoByEZMediaEditor
      vendor_id       : [0][0][0][0]
    Stream #0:1(jpn): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      handler_name    : Audio1-jpn
      vendor_id       : [0][0][0][0]
At least one output file must be specified

saving and loading metadata to file

save

ffmpeg -i input -f ffmetadata data.txt

load

ffmpeg -i input.mp4 -i data.txt -map_metadata 1 -c copy output.mp4 ## take the global metadata from input 1 (data.txt), copy the streams

subtitle

SRT format

Subtitles also come in different codecs (ffmpeg -codecs marks them with S).

Image Processing

creating images

screenshots from videos

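Take one screenshot at 180 seconds

ffmpeg -ss 180 -i input -frames:v 1 image.type ## -frames:v 1 writes a single image

For GIF output, the pixel format is set to rgb24 here:

ffmpeg -i input -pix_fmt rgb24 res.gif

To create a solid-color image, use the color source:
ffmpeg -f lavfi -i color=c=red:s=320x240 -frames:v 1 red.png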

video conversion to images

ffmpeg -i clip.avi frame%d.jpg ## frame1.jpg, frame2.jpg, ..., frame112.jpg, ...
ffmpeg -i clip.avi frame%4d.jpg ## frame0001.jpg, frame0002.jpg, ...

resizing, cropping and padding images

Same as for video.

flipping rotating and overlaying

hflip, vflip: same as for video.

conversion between image types

ffmpeg -i image.type1 image.type2

creating video from images

ffmpeg -loop 1 -i photo.jpg -t 10 photo.mp4

creating video from many images

ffmpeg -f image2 -i img%d.jpg -r 25 video.mp4

Digital Audio

introduction

audio file formats

sound synthesis

stereo and more complex sounds

sound volume setting

-af volume=vol

-af volume=10dB

-af volume=2/3
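A gain in decibels corresponds to an amplitude factor of 10^(dB/20), so volume=10dB and volume=3.16 are roughly the same. A sketch of both conversions; db_to_gain and gain_to_db are hypothetical helpers, and awk needs decimal input (0.6667 rather than 2/3).

```shell
#!/bin/sh
# Hypothetical helpers: convert between decibels and linear amplitude factors.
# db_to_gain DB -> 10^(DB/20)
db_to_gain() { awk -v db="$1" 'BEGIN { printf "%g\n", 10 ^ (db / 20) }'; }
# gain_to_db FACTOR -> 20 * log10(FACTOR)
gain_to_db() { awk -v g="$1" 'BEGIN { printf "%.2f\n", 20 * log(g) / log(10) }'; }

db_to_gain 20    # 10
db_to_gain 10    # 3.16228
gain_to_db 0.5   # -6.02: halving the amplitude is about -6 dB
```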

multiple sounds mixed to one output

amix=inputs=4:dropout_transition=5 ## mix 4 inputs; renormalize the volume over 5 s when an input ends

downmixing stereo to mono, surround to stereo

pan=, e.g. -af "pan=1c|c0=0.5*c0+0.5*c1" mixes the left and right channels into mono

simple audio analyzer

-af ashowinfo

ffmpeg -report -i 264.mp3 -af ashowinfo -f null /dev/null

[Parsed_ashowinfo_0 @ 0x7fe1cbd1a2c0] n:41230 pts:47495855 pts_time:1077 pos:17232841 fmt:fltp channels:2 chlayout:stereo rate:44100 nb_samples:1152 checksum:58E5F619 plane_checksums: [ 2015F984 E98DFC86 ]
[Parsed_ashowinfo_0 @ 0x7fe1cbd1a2c0] n:41231 pts:47497007 pts_time:1077.03 pos:17233259 fmt:fltp channels:2 chlayout:stereo rate:44100 nb_samples:1152 checksum:8E980E93 plane_checksums: [ 85F6FE29 1A82105B ]
[Parsed_ashowinfo_0 @ 0x7fe1cbd1a2c0] n:41232 pts:47498159 pts_time:1077.06 pos:17233677 fmt:fltp channels:2 chlayout:stereo rate:44100 nb_samples:1152 checksum:07338488 plane_checksums: [ 166FBEB5 6D8EC5C4 ]
[Parsed_ashowinfo_0 @ 0x7fe1cbd1a2c0] n:41233 pts:47499311 pts_time:1077.08 pos:17234095 fmt:fltp channels:2 chlayout:stereo rate:44100 nb_samples:1152 checksum:88DF8EF9 plane_checksums: [ AE7FC290 ED10CC5A ]
[Parsed_ashowinfo_0 @ 0x7fe1cbd1a2c0] n:41234 pts:47500463 pts_time:1077.11 pos:17234513 fmt:fltp channels:2 chlayout:stereo rate:44100 nb_samples:1152 checksum:A6B1ACD7 plane_checksums: [ B158BEC9 0A14EDFF ]

adjusting audio for listening with headphones

-af earwax

audio modifications with -map_channel option

merging 2 audio streams to 1 multichannel stream

ffmpeg -i moni1.mp3 -af "amovie=moni2.mp3[2];[in][2]amerge" stereo.mp3

audio stream forwarding with buffer order control

astreamsync

presets for codecs

introduction

Preset files: text files that bundle a set of options, mostly as key=value pairs.

-fpre mpeg2.ffpreset

Interlaced video

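Field: a unit of TV imaging. TV display and camera capture both use interlaced scanning: the odd lines are scanned first, then the even lines. The reason is that scanning all lines in one pass takes relatively long, and the decay of the phosphors on the screen would make the picture flicker visibly. Note: stitching two adjacent fields together does not give exactly the same result as a frame, because the fields are captured at different times; in PAL, for example, one field lasts about 1/50 s. The field rate is generally twice the frame rate.

Interlaced scan: a method of showing images on scanning display devices such as the cathode-ray tubes (CRT) of traditional TVs. TV uses interlaced scanning: the odd lines 1, 3, 5, 7, … are scanned to form the first field, then the even lines 2, 4, 6, 8, … form the second field, and two fields make up one frame. In the NTSC system, 30 frames (60 fields) are produced per second, which is effectively 60 pictures per second; this reduces flicker while saving half the bandwidth.
Progressive scan: a method of presenting moving images on a display in which all pixels of each frame are shown together.
Deinterlacing: converting an interlaced video signal into a progressive one.

Frame: like fields, frames are used to form the picture. Film mostly uses frames; the human eye generally perceives motion as continuous at more than 24 frames per second.

Computer monitors use non-interlaced (progressive) scanning, so they use higher scan frequencies.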

NTSC PAL SECAM TV standards

Analog TV (color encoding) standards.

PAL

Phase Alternating Line (PAL) is a colour encoding system for analogue television used in broadcast television systems in most countries broadcasting at 625-line / 50 field per second (576i).

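The European standard, also used in China; the frame rate is 25 fps, i.e. 50i. In the film industry, because the eye already perceives 24 fps as continuous and film stock costs money, 24 fps became the fixed rate.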

NTSC

The abbreviation NTSC can refer to the National Television System Committee, which developed the analog television color system that was introduced in North America in 1954 and stayed in use until digital conversion.

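Mainly the US standard. The frame rate is 30 fps; in transmission each frame is split into 2 fields, i.e. 60i, so the signal runs at 60 Hz.

More precisely, 60/1.001 (about 59.94) fields per second.

Scan lines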

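A frame has 525 or 625 scan lines. The standard set by the US National Television System Committee (NTSC) requires 525 scan lines per picture and 30 pictures per second, while European countries use systems with 625 scan lines per frame and 25 frames per second. Put simply, the TV picture is divided horizontally into many scan lines; the finer the division, the sharper the picture and the more scan lines there are. Resolution is measured in TV lines (TVL), also simply called lines: the count in the horizontal direction, obtained by standing each scan line upright and scaling by the 4:3 or 16:9 aspect ratio.

Summary of common frame rates

23.98 (24000/1001): this rate arose from converting 24 fps film to NTSC. Converting 24 fps film frame by frame to 29.97 fps NTSC video requires a 3:2 pulldown (https://en.wikipedia.org/wiki/Three-two_pull_down). 23.98 fps is used for DVD and broadcast TV, and PCs and nearly everything else support it, so it is the safest choice.
30 vs 29.97 (30000/1001)
24 fps is only for cinema and PCs, where compatibility is not a concern.
24->PAL: The most popular method is to speed up the material by 1/24 (≈4.2%).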
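The numbers in this summary follow from a few ratios: NTSC rates are the nominal rate x 1000/1001, 3:2 pulldown spreads 4 film frames over 3+2+3+2 = 10 fields, and PAL speed-up is 25/24. A short sketch that checks them; ntsc_rate is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: NTSC-style rate, NOMINAL * 1000 / 1001
ntsc_rate() { awk -v r="$1" 'BEGIN { printf "%.3f\n", r * 1000 / 1001 }'; }

ntsc_rate 24   # 23.976
ntsc_rate 30   # 29.970
ntsc_rate 60   # 59.940 fields per second

# 3:2 pulldown: 4 film frames -> 10 fields, so 24 frames/s -> 60 fields/s (30 frames/s)
echo $(( 24 * 10 / 4 ))   # 60

# 24 -> 25 fps PAL speed-up, in percent
awk 'BEGIN { printf "%.2f\n", (25 / 24 - 1) * 100 }'   # 4.17
```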

Interlaced frame type setting

field order

deinterlacing

ffmpeg components and projects

ffplay introduction

ffplay -showmode 1 x.mp3 displays the audio waveform.
ffplay -flags2 +export_mvs ~/Downloads/h1.mp4 -vf codecview=mv_type=fp visualizes the forward-predicted motion vectors of all frames.
vis_mb_type (macroblock type visualization) is no longer available.

ffprobe introduction

ffprobe gathers information from multimedia streams and prints it in human- and machine-readable fashion.
ffprobe -show_streams h1.mp4
ffprobe -show_packets -print_format compact xxxx.flv
ffprobe -v trace -i xxx.flv

Show pict_type

ffprobe -show_frames 好想告诉你\ -\ 第3集.MP4| grep -E 'pict_type'

ffprobe -show_frames input.bin | grep key_frame
key_frame=1
key_frame=0
key_frame=0
key_frame=0
key_frame=0
key_frame=0

pict_type=I shows that this is an intra (I) frame; key_frame=1 means it is an IDR frame, and key_frame=0 means a non-IDR frame.

Frames with pict_type=I yet key_frame=0

This means that although it is an I-frame, the decoder might still use/need previously sent frames for decoding. In contrast, if key_frame=1 this would be an IDR frame (Instantaneous Decoder Refresh).
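Instead of scrolling through grep output, the frame types can be tallied. A sketch, assuming ffprobe's default output format (one pict_type=X line per frame); frame_type_stats is a hypothetical helper, and the ffprobe line in the comment is only a usage example.

```shell
#!/bin/sh
# Hypothetical helper: read `ffprobe -show_frames` output on stdin
# and count how many frames there are of each pict_type.
frame_type_stats() {
  awk -F= '$1 == "pict_type" { n[$2]++ } END { for (t in n) print t, n[t] }' | sort
}

# Self-contained demo with sample input: prints "B 1", "I 1", "P 2"
printf 'pict_type=I\npict_type=P\npict_type=B\npict_type=P\n' | frame_type_stats

# Usage (needs ffprobe):
#   ffprobe -v error -select_streams v:0 -show_entries frame=pict_type input.mp4 | frame_type_stats
```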

More ffprobe usage

ffprobe -show_packets

 [PACKET]
 codec_type=video
 stream_index=0
 pts=N/A
 pts_time=N/A
 dts=N/A
 dts_time=N/A
 duration=48000
 duration_time=0.040000
 convergence_duration=N/A
 convergence_duration_time=N/A
 size=1075
 pos=0
 flags=K_
 [/PACKET]

 [PACKET]
 codec_type=video
 stream_index=0
 pts=0
 pts_time=0.000000
 dts=-2002
 dts_time=-0.083417
 duration=1001
 duration_time=0.041708
 convergence_duration=N/A
 convergence_duration_time=N/A
 size=1051
 pos=48
 flags=K_
 [/PACKET]

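flags=K_: K marks a keyframe packet; an underscore means the corresponding flag is not set (the second position shows D for packets marked to be discarded).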

ffserver introduction

ffmpeg software libraries

Microphone and Webcam

Introduction

batch files

color corrections

Advanced Techniques

joining audio and video files

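Concatenate two files

ffmpeg -i concat:"v1.mpg|v2.mpg" -c copy video.mpg ## concat protocol: joins the files byte by byte, so it suits formats such as MPEG-PS and MPEG-TS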
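For containers that cannot be joined byte by byte (MP4, for example), FFmpeg's concat demuxer reads a list file with one file 'name' line per input; with -c copy the inputs should share codecs and parameters. make_concat_list is a hypothetical helper that writes such a list, assuming the file names contain no single quotes.

```shell
#!/bin/sh
# Hypothetical helper: write a list file for FFmpeg's concat demuxer.
# Assumes the file names contain no single quotes.
make_concat_list() {
  for f in "$@"; do
    printf "file '%s'\n" "$f"
  done
}

make_concat_list v1.mpg v2.mpg

# Usage (needs ffmpeg):
#   make_concat_list v1.mp4 v2.mp4 > list.txt
#   ffmpeg -f concat -safe 0 -i list.txt -c copy video.mp4
```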

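Removing a watermark (logo)

-vf delogo=x:y:w:h (fills the rectangle covering the logo by interpolating the surrounding pixels)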

fixing of shaking video parts

-vf deshake

add color box to video

-vf drawbox

number of frames detection

ffmpeg -i 264.mp4 -c copy -f null /dev/null

frame=32520 fps=0.0 q=-1.0 Lsize=N/A time=00:22:36.34 bitrate=N/A speed=4.47e+03x
video:188431kB audio:24745kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
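The frame count can be pulled out of that log automatically. ffmpeg redraws its progress line with carriage returns, so the sketch splits on them and keeps the last frame= value; last_frame_count is a hypothetical helper. An alternative that reads the container only is ffprobe's packet counter, shown in the comment.

```shell
#!/bin/sh
# Hypothetical helper: read ffmpeg's log on stdin, print the final frame= value.
last_frame_count() {
  tr '\r' '\n' | sed -n 's/^frame= *\([0-9][0-9]*\).*/\1/p' | tail -n 1
}

# Usage (needs ffmpeg; the log goes to stderr):
#   ffmpeg -i 264.mp4 -c copy -f null /dev/null 2>&1 | last_frame_count
# Alternative with ffprobe (counts video packets):
#   ffprobe -v error -select_streams v:0 -count_packets \
#     -show_entries stream=nb_read_packets -of csv=p=0 264.mp4
```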

detection of ad sections or corrupted encoding

blackdetect blackframe

selecting specified frames to output

select=expression

aspect ratio: the ratio of a frame's or screen's width to its height